自动显微镜和定量图像分析的进展已促进了高含量筛查(HCS)作为有效的药物发现和研究工具。尽管HCS提供了高吞吐量图像的复杂细胞表型,但该过程可能会受到图像畸变的阻碍,例如异常图像模糊,荧光团饱和度,碎屑,高噪声,高水平的噪声,意外的自动荧光或空的图像。尽管此问题在文献中受到了温和的关注,但忽略这些人工制品会严重阻碍下游图像处理任务,并阻碍对细微表型的发现。因此,在HCS中使用质量控制是主要问题,也是先决条件。在这项工作中,我们评估了不需要大量图像注释的深度学习选项,即可为此问题提供直接且易于使用的半监督学习解决方案。具体而言,我们比较了最近的自我监督和转移学习方法的功效,以提供高吞吐量伪像图像检测器的基础编码器。这项研究的结果表明,对于此任务,应首选转移学习方法,因为它们不仅在这里表现出色,而且具有不需要敏感的超参数设置或大量额外培训的优势。
translated by 谷歌翻译
语言模型既展示了定量的改进,又展示了新的定性功能,随着规模的增加。尽管它们具有潜在的变革性影响,但这些新能力的特征却很差。为了为未来的研究提供信息,为破坏性的新模型能力做准备,并改善社会有害的效果,至关重要的是,我们必须了解目前和近乎未来的能力和语言模型的局限性。为了应对这一挑战,我们介绍了超越模仿游戏基准(Big Bench)。 Big Bench目前由204个任务组成,由132家机构的442位作者贡献。任务主题是多样的,从语言学,儿童发展,数学,常识性推理,生物学,物理学,社会偏见,软件开发等等。 Big-Bench专注于被认为超出当前语言模型的功能的任务。我们评估了OpenAI的GPT型号,Google内部密集变压器体系结构和大型基础上的开关稀疏变压器的行为,跨越了数百万到数十亿个参数。此外,一个人类专家评估者团队执行了所有任务,以提供强大的基准。研究结果包括:模型性能和校准都随规模改善,但绝对的术语(以及与评估者的性能相比);在模型类中的性能非常相似,尽管带有稀疏性。逐渐和预测的任务通常涉及大量知识或记忆成分,而在临界规模上表现出“突破性”行为的任务通常涉及多个步骤或组成部分或脆性指标;社交偏见通常会随着含糊不清的环境而随着规模而增加,但这可以通过提示来改善。
translated by 谷歌翻译
Classical methods for acoustic scene mapping require the estimation of time difference of arrival (TDOA) between microphones. Unfortunately, TDOA estimation is very sensitive to reverberation and additive noise. We introduce an unsupervised data-driven approach that exploits the natural structure of the data. Our method builds upon local conformal autoencoders (LOCA) - an offline deep learning scheme for learning standardized data coordinates from measurements. Our experimental setup includes a microphone array that measures the transmitted sound source at multiple locations across the acoustic enclosure. We demonstrate that LOCA learns a representation that is isometric to the spatial locations of the microphones. The performance of our method is evaluated using a series of realistic simulations and compared with other dimensionality-reduction schemes. We further assess the influence of reverberation on the results of LOCA and show that it demonstrates considerable robustness.
translated by 谷歌翻译
Software Defect Prediction aims at predicting which software modules are the most probable to contain defects. The idea behind this approach is to save time during the development process by helping find bugs early. Defect Prediction models are based on historical data. Specifically, one can use data collected from past software distributions, or Versions, of the same target application under analysis. Defect Prediction based on past versions is called Cross Version Defect Prediction (CVDP). Traditionally, Static Code Metrics are used to predict defects. In this work, we use the Class Dependency Network (CDN) as another predictor for defects, combined with static code metrics. CDN data contains structural information about the target application being analyzed. Usually, CDN data is analyzed using different handcrafted network measures, like Social Network metrics. Our approach uses network embedding techniques to leverage CDN information without having to build the metrics manually. In order to use the embeddings between versions, we incorporate different embedding alignment techniques. To evaluate our approach, we performed experiments on 24 software release pairs and compared it against several benchmark methods. In these experiments, we analyzed the performance of two different graph embedding techniques, three anchor selection approaches, and two alignment techniques. We also built a meta-model based on two different embeddings and achieved a statistically significant improvement in AUC of 4.7% (p < 0.002) over the baseline method.
translated by 谷歌翻译
The United States coastline spans 95,471 miles; a distance that cannot be effectively patrolled or secured by manual human effort alone. Unmanned Aerial Vehicles (UAVs) equipped with infrared cameras and deep-learning based algorithms represent a more efficient alternative for identifying and segmenting objects of interest - namely, ships. However, standard approaches to training these algorithms require large-scale datasets of densely labeled infrared maritime images. Such datasets are not publicly available and manually annotating every pixel in a large-scale dataset would have an extreme labor cost. In this work we demonstrate that, in the context of segmenting ships in infrared imagery, weakly-supervising an algorithm with sparsely labeled data can drastically reduce data labeling costs with minimal impact on system performance. We apply weakly-supervised learning to an unlabeled dataset of 7055 infrared images sourced from the Naval Air Warfare Center Aircraft Division (NAWCAD). We find that by sparsely labeling only 32 points per image, weakly-supervised segmentation models can still effectively detect and segment ships, with a Jaccard score of up to 0.756.
translated by 谷歌翻译
Autonomous underwater vehicles (AUVs) are regularly used for deep ocean applications. Commonly, the autonomous navigation task is carried out by a fusion between two sensors: the inertial navigation system and the Doppler velocity log (DVL). The DVL operates by transmitting four acoustic beams to the sea floor, and once reflected back, the AUV velocity vector can be estimated. However, in real-life scenarios, such as an uneven seabed, sea creatures blocking the DVL's view and, roll/pitch maneuvers, the acoustic beams' reflection is resulting in a scenario known as DVL outage. Consequently, a velocity update is not available to bind the inertial solution drift. To cope with such situations, in this paper, we leverage our BeamsNet framework and propose a Set-Transformer-based BeamsNet (ST-BeamsNet) that utilizes inertial data readings and previous DVL velocity measurements to regress the current AUV velocity in case of a complete DVL outage. The proposed approach was evaluated using data from experiments held in the Mediterranean Sea with the Snapir AUV and was compared to a moving average (MA) estimator. Our ST-BeamsNet estimated the AUV velocity vector with an 8.547% speed error, which is 26% better than the MA approach.
translated by 谷歌翻译
We consider a long-term average profit maximizing admission control problem in an M/M/1 queuing system with a known arrival rate but an unknown service rate. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue-length of the system. \cite[Econometrica]{Naor} showed that if all the parameters of the model are known, then it is optimal to use a static threshold policy - admit if the queue-length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full information model of \cite{Naor}. We show that the algorithm achieves an $O(1)$ regret when all optimal thresholds with full information are non-zero, and achieves an $O(\ln^{3+\epsilon}(N))$ regret in the case that an optimal threshold with full information is $0$ (i.e., an optimal policy is to reject all arrivals), where $N$ is the number of arrivals and $\epsilon>0$.
translated by 谷歌翻译
Contrastive learning has been successfully used for retrieval of semantically aligned sentences, but it often requires large batch sizes or careful engineering to work well. In this paper, we instead propose a generative model for learning multilingual text embeddings which can be used to retrieve or score sentence pairs. Our model operates on parallel data in $N$ languages and, through an approximation we introduce, efficiently encourages source separation in this multilingual setting, separating semantic information that is shared between translations from stylistic or language-specific variation. We show careful large-scale comparisons between contrastive and generation-based approaches for learning multilingual text embeddings, a comparison that has not been done to the best of our knowledge despite the popularity of these approaches. We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval -- the last of which we introduce in this paper. Overall, our Variational Multilingual Source-Separation Transformer (VMSST) model outperforms both a strong contrastive and generative baseline on these tasks.
translated by 谷歌翻译
We present a human-in-the-loop evaluation framework for fact-checking novel misinformation claims and identifying social media messages that violate relevant policies. Our approach extracts structured representations of check-worthy claims, which are aggregated and ranked for review. Stance classifiers are then used to identify tweets supporting novel misinformation claims, which are further reviewed to determine whether they violate relevant policies. To demonstrate the feasibility of our approach, we develop a baseline system based on modern NLP methods for human-in-the-loop fact-checking in the domain of COVID-19 treatments. Using our baseline system, we show that human fact-checkers can identify 124 tweets per hour that violate Twitter's policies on COVID-19 misinformation. We will make our code, data, and detailed annotation guidelines available to support the evaluation of human-in-the-loop systems that identify novel misinformation directly from raw user-generated content.
translated by 谷歌翻译
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
translated by 谷歌翻译